
    Information extraction from template-generated hidden web documents

    The greater part of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). These databases, referred to as Hidden Web databases, dynamically generate a list of documents in response to a user query. Such documents are typically presented to users as template-generated Web pages. This paper presents a new approach that identifies Web page templates in order to extract query-related information from documents. We propose two forms of representation to analyse the content of a document: Text with Immediate Adjacent Tag Segments (TIATS) and Text with Neighbouring Adjacent Tag Segments (TNATS). Our techniques exploit the tag structures that surround the textual contents of documents to detect Web page templates and thereby extract query-related information. Experimental results demonstrate that TNATS detects Web page templates most effectively and extracts information with high recall and precision.
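    The general idea of tag-anchored template detection can be illustrated with a small sketch. This is a hypothetical simplification, not the TIATS or TNATS procedure from the paper: each non-empty text run is keyed by the tag immediately before and after it, and a segment that recurs in every result page from the same database is treated as template, leaving the rest as query-related text. The names SegmentParser, segments_of, split_template and the min_share threshold are illustrative assumptions.

```python
from collections import Counter
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect (preceding_tag, text, following_tag) triples from one page."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._last_tag = None  # most recent start or end tag seen
        self._pending = None   # (preceding_tag, text) still waiting for its following tag

    def _tag_event(self, label):
        # A tag closes any pending text run and becomes the new "preceding" tag.
        if self._pending is not None:
            prev, text = self._pending
            self.segments.append((prev, text, label))
            self._pending = None
        self._last_tag = label

    def handle_starttag(self, tag, attrs):
        self._tag_event("<" + tag + ">")

    def handle_endtag(self, tag):
        self._tag_event("</" + tag + ">")

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._pending is None:
            self._pending = (self._last_tag, text)
        else:  # join text delivered in several chunks
            prev, earlier = self._pending
            self._pending = (prev, earlier + " " + text)

    def finish(self):
        """Emit a trailing text run that has no following tag."""
        if self._pending is not None:
            prev, text = self._pending
            self.segments.append((prev, text, None))
            self._pending = None


def segments_of(html):
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    parser.finish()
    return parser.segments


def split_template(pages, min_share=1.0):
    """Return, per page, the text of segments NOT shared by >= min_share of the pages."""
    per_page = [segments_of(p) for p in pages]
    counts = Counter()
    for segs in per_page:
        counts.update(set(segs))  # count each segment at most once per page
    threshold = min_share * len(pages)
    return [[text for (prev, text, nxt) in segs if counts[(prev, text, nxt)] < threshold]
            for segs in per_page]


pages = [
    "<html><body><h1>Results</h1><p>Alpha record</p><div>Footer</div></body></html>",
    "<html><body><h1>Results</h1><p>Beta record</p><div>Footer</div></body></html>",
]
print(split_template(pages))  # [['Alpha record'], ['Beta record']]
```

    Keying segments on surrounding tags rather than on text alone means the same words in a different structural position are not automatically treated as template; lowering min_share would tolerate templates that vary slightly between pages.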

    Organizing information on the next generation web - Design and implementation of a new bookmark structure

    The next-generation Web will increase the need for a highly organized and ever-evolving method of storing references to Web objects. These requirements could be met by developing a new bookmark structure. This paper endeavors to identify the key requirements of such a bookmark, specifically in relation to Web documents, and sets out a suggested design through which these needs may be accomplished. The prototype developed offers features such as the sharing of bookmarks between users and groups of users. Bookmarks for Web documents in this prototype allow more specific information to be stored, such as the URL, document type, document title, keywords, a summary, user annotations, date added, date last visited and date last modified. Individuals may access the service from anywhere on the Internet, as long as they have a Java-enabled Web browser.
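    The bookmark fields and the user/group sharing described above map naturally onto a record type. The sketch below is an assumption for illustration only (the prototype itself was Java-based and its actual schema is not given here): the class name Bookmark, the field names, and the visible_to and visit helpers are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Bookmark:
    """Hypothetical bookmark record holding the fields listed in the abstract."""
    url: str
    doc_type: str
    title: str
    keywords: List[str] = field(default_factory=list)
    summary: str = ""
    annotations: List[str] = field(default_factory=list)
    date_added: date = field(default_factory=date.today)
    date_last_visited: Optional[date] = None
    date_last_modified: Optional[date] = None
    # Sharing: an owner plus the users and groups the bookmark is shared with.
    owner: str = ""
    shared_users: Set[str] = field(default_factory=set)
    shared_groups: Set[str] = field(default_factory=set)

    def visible_to(self, user: str, user_groups: Set[str]) -> bool:
        """True if the user owns the bookmark or it is shared with them or one of their groups."""
        return (user == self.owner
                or user in self.shared_users
                or bool(self.shared_groups & user_groups))

    def visit(self, when: date) -> None:
        """Record a visit by updating the last-visited date."""
        self.date_last_visited = when


b = Bookmark(url="http://example.org/", doc_type="text/html", title="Example",
             keywords=["example"], owner="alice", shared_groups={"lab"})
print(b.visible_to("bob", {"lab"}))      # True
print(b.visible_to("carol", {"other"}))  # False
```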

    A granular approach to web search result presentation

    In this paper we propose and evaluate interfaces for presenting the results of web searches. Sentences taken from the top retrieved documents are used as fine-grained representations of document content and, when combined in a ranked list, provide a query-specific overview of the set of retrieved documents. Current search engine interfaces assume that users examine such results document by document. In contrast, our approach groups, ranks and presents the contents of the top-ranked document set. We evaluate the hypotheses that such an approach can lead to more effective web searching and to increased user satisfaction. Our evaluation, with real users and different types of information-seeking scenarios, showed with statistical significance that these hypotheses hold.
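    A sentence-level result list of the kind described above can be sketched as follows. The scoring here is an assumption, not the paper's ranking method: each sentence from the top documents is scored by how many query-term occurrences it contains, with ties broken by the source document's rank and then by sentence position. The function name rank_sentences and the top_k parameter are hypothetical.

```python
import re


def rank_sentences(query, documents, top_k=5):
    """Rank sentences from a ranked list of (doc_id, text) pairs by query-term overlap.

    Ties are broken by the rank of the source document, then by sentence position.
    """
    terms = set(query.lower().split())
    candidates = []
    for doc_rank, (doc_id, text) in enumerate(documents):
        # Naive sentence splitting on ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        for position, sentence in enumerate(sentences):
            words = re.findall(r"\w+", sentence.lower())
            overlap = sum(1 for w in words if w in terms)
            if overlap > 0:
                candidates.append((overlap, doc_rank, position, doc_id, sentence))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(doc_id, sentence) for _, _, _, doc_id, sentence in candidates[:top_k]]


docs = [
    ("d1", "Solar power is cheap. Cats sleep a lot."),
    ("d2", "Wind and solar power grow fast. Rain falls."),
]
for doc_id, sentence in rank_sentences("solar power", docs):
    print(doc_id, sentence)
```

    Because sentences from different documents compete in one list, the reader sees the most query-relevant passages first rather than scanning whole documents one at a time.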

    IVOA Recommendation: IVOA Support Interfaces

    This document describes the minimum interface that a (SOAP- or REST-based) web service requires in order to participate in the IVOA. This interface is not required of standard VO services developed prior to this specification, although its uptake is strongly encouraged in any subsequent revision. All new standard VO services, however, must feature a VOSI-compliant interface. This document has been produced by the Grid and Web Services Working Group. It has been reviewed by IVOA Members and other interested parties, and has been endorsed by the IVOA Executive Committee as an IVOA Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. IVOA's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances functionality and interoperability within the astronomical community.

    Libarcclient: A Client Library for ARC

    This document describes, from a technical viewpoint, a plugin-based client library for the new Web Service (WS) based Advanced Resource Connector (ARC) middleware.

    A Query Integrator and Manager for the Query Web

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web-accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to navigate manually through multiple information sources in order to answer specific questions.
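    The chaining condition stated above (any query language is allowed, provided the output is XML) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than QI's interface: saved queries are modelled as plain Python functions that take the previous query's XML (or None) and return XML text, and run_chain, brain_terms, long_terms and the sample terms are illustrative names and data.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# A saved query takes the XML produced by the previous query (or None) and returns XML text.
SavedQuery = Callable[[Optional[str]], str]


def run_chain(registry: Dict[str, SavedQuery], names: List[str]) -> Optional[str]:
    """Run saved queries in order, feeding each one's XML output to the next."""
    xml: Optional[str] = None
    for name in names:
        xml = registry[name](xml)
        ET.fromstring(xml)  # raises ParseError if the output is not well-formed XML
    return xml


def brain_terms(_: Optional[str]) -> str:
    # Stands in for a query against a terminology source; the terms are illustrative.
    root = ET.Element("results")
    for term in ["hippocampus", "amygdala", "cerebellum"]:
        ET.SubElement(root, "term").text = term
    return ET.tostring(root, encoding="unicode")


def long_terms(xml: Optional[str]) -> str:
    # A second query that consumes the first one's XML and keeps terms longer than 9 characters.
    root = ET.Element("results")
    for term in ET.fromstring(xml).iter("term"):
        if len(term.text) > 9:
            ET.SubElement(root, "term").text = term.text
    return ET.tostring(root, encoding="unicode")


registry = {"brain_terms": brain_terms, "long_terms": long_terms}
print(run_chain(registry, ["brain_terms", "long_terms"]))
# <results><term>hippocampus</term><term>cerebellum</term></results>
```

    Since each step only sees XML text, a step could equally be backed by a SPARQL or XQuery engine; the chain does not need to know which language produced its input.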